With spatial trajectory analysis SPATA introduces a new approach to find, analyze and visualize differently expressed genes and gene-sets in a spatial context. While the classic differential gene expression analyzes differences between experimental groups as a whole it neglects changes of expression levels that can only be observed while maintaining the spatial dimensions. Spatial trajectories allow to answer questions that include such a spatial component. E.g.:
1.) In how far do expression levels change the more one moves towards a region of interest or away from it?
2.) Which genes follow the same pattern along these paths?
This tutorial is split into three parts. The first part explains the transformational process that has to take place in order to equip the spatial expression values with a directional component. It explains what happens in the background of createTrajectories() and plotTrajectoriyGenes(), -GeneSets(), -Features(). The second part is about the modeling and screening of trends and dynamics along trajectories (assessTrajectoryTrends()). The third part exemplifies how the concept of spatial trajectory analysis can be deployed to complement the classic DE-analysis and to derive deeper insights into the sample at hand.
Figure 1.1 displays the surface plot of a glioblastoma sample that is used throughout this chapter as an example. The plot is colored according to the expression values of the gene-set Hallmark Hypoxia. Next to it the example trajectory is displayed which runs towards the hypoxic area and is named accordingly ‘towards-hypoxia’.
Figure 1.1: Surface- and trajectory plot colored according to the ‘Hallmark-Hypoxia’ gene-set expression values
The process of setting up a trajectory can be splitted into four steps:
The chapters below elaborate on each of these four steps.
In order to draw the trajectory a surface plot is required. Extracting the coordinates of all barcode-spots via getCoordinates() provides the basic data.frame needed.
Within the interface that is accessible through the function createTrajectories() the start- and endpoint of the trajectory are determined and stored in a data.frame like the one shown below. A trajectory can be composed of several parts which would be the case if the endpoint of the first part is determined as the starting point of the next part and so on. However, the example trajectory in this chapter is composed of only one part. Thus the data.frame contains only one observation.
Mapping the x- and y-coordinates of the barcode-spots to the respective x- and y-aesthetics results in a basic surface plot. The data.frame above contains information about start- and endpoint of the trajectory which can be used to plot the trajectory’s course.
Figure 1.2: Basic surface plot and the example trajectory’s course
A trajectory includes only a subset of barcode-spots. To determine the barcode-spots that fall into the trajectory’s reach the trajectory’s width needs to be determined which can be done manually within the interface of createTrajectories() - the default is 20.
With that value at hand two additional vectors can be computed that each cross the trajectory’s start- and endpoint orthogonally. Start- and endpoint of these crossing vectors determine the vertices of the rectangle that includes all the barcode-spots that fall into the trajectory’s reach. The higher the value for the trajectory’s width the wider it gets and the more barcode-spots it includes.
Figure 1.3: The trajectory’s reach for different trajectory widths
The function point.in.polygon()of the sp-package allows to filter the data.frame that contains information about all barcode-spots for those spots that fall into the polygon and thus in the trajectory’s reach.
Figure 1.4: Highlighted are the barcode spots that fall into the trajectory’s reach
The following chapters continue with the trajectory whoose width was set to 20.
In order to integrate the direction of the trajectory as an informative variable every barcode-spot needs to be set into relation to the trajectory’s course. The closer a barcode-spot is located towards the trajectory’s origin the smaller the respective value needs to be.
This is achieved by projecting the barcode-spots position onto the vector that corresponds to the trajectory’s course. The local coordinate system of Figure 5 is build around all relevant barcode- spots whereby it’s x- and y-axis correspond to the trajectory’s width (local width axis) and the trajectory itself (local length axis, lla), albeit displaced towards the edge.
The local length axis inherits it’s direction and course from the original trajectory. In order to arrange the barcode-spots according to the trajectory’s direction their projection onto the local length axis needs to be calculated via vector-projection which requires two main steps.
1.) Calculate vector a that connects the barcode-spot of interest to the origin of the local coordinate system. Then project vector a onto the local length axis (lla) such that \(\vec{proj} = \frac{\vec{sto} * \vec{traj}}{||\vec{traj}||^2} * \vec{traj}\)
2.) Calculate the magnitude of the vector that corresponds to the projection: \(proj.length = |\vec{proj}|\)
The higher the calculated projection length the higher the distance between the barcode-spot and the origin of the trajectory.
Figure 1.5: Projecting every barcode-spot orthogonally onto the local length axis of the local coordinate system.
The barcode-spots are now arranged according to their projection onto the trajectory. However, plotting all barcode-spot’s expression values against the trajectory’s direction requires to first bin them into groups of spots with simultaneous projection_length-values. Skipping that would result in a plot corresponding to a trajectory drawn in the figure below.
Figure 1.6: Current arrangement (corresponding to the PL-values) of all relevant barcode spots resembles a serpentine course
By rounding the barcode-spot’s projection_length-values with a given binwidth they are sorted into bins for which the average expression values are calculated and in return plotted against the trajectory’s direction. For visualization purpose the binwidth is increased to 40 - the default is 5 which results in way smaller groups.
Figure 1.7: Visualization of the desired (averaged) arrangement
Using the barcodes-variable as the key variable expression-values (or any other kind of numeric variable) can be joined to the data.frame containing the arranged and grouped barcode-spots as shown in the data.frame below.
The data.frame above contains an additional variable HM_HYPOXIA (full gene set name: Hallmark Hypoxia) corresponding to the hypoxia related expression-values each barcode-spot features. These expression values are then averaged groupwise whereby the barcode-spots group belonging is determined by the previously generated variable pl_binned. The summarized data.frame below contains a new variable trajectory_order which effectively renames the values of pl_binned and eventually determines the order in which the averaged expression values are plotted.
The summarized, averaged values of every group can now be plotted against the trajectory’s direction.
Figure 1.8: The hypoxic related expression trend of all barcode spots within the defined trajectory in a one dimensional lineplot
Once a trajectory is drawn in the light of a biologic question or finding dynamics of additional variables can be easily visualized and investigated one by one. However, given that the average expression matrix contains tens of thousands of genes investigating all genes and gene-sets manually can quickly become a cumbersome task.
Therefore, apart from functions that visualize the trends of variables along drawn trajectories SPATA offers a way to screen all genes, gene-sets and features for those that follow certain trends representing biologically relevant dynamics.
How well a certain trend fits the dynamic of gene or gene-set along a trajectory is evaluated by calculating the area under the curve of the fitting’s residuals. The example below visualizes that by reference to the expression dynamics of GFAP.
Figure 2.2: Trajectory trend assessment exemplified by the dynamics of GFAP along the trajectory ‘towards-hypoxia’.
The resulting data.frame is composed of three columns.
Every observation corresponds to the evaluation of one trend fitted to the dynamic of one variable.
The trajectory used in the examples above was drawn in the light of a hypoxic high area. Assuming that one does not randomly stumble upon eye catching areas like that it is usually clustering that finds such areas of interest and subsequent DE-analysis that analyses them. Figure 2.1 shows the results of different cluster algorithms that differ slightly in their outcome but without exception identify a cluster that with respect to it’s location corresponds to the hypoxic high area as well as an adjacent area that surrounds the former.
Figure 3.1: Different clustering algorithms identify an area corresponding in it’s location to the hypoxic-high area.
After defining a hypoxic area and an adjacent area differential expression analysis finds the genes that are exclusively upregulated displayed in the heatmap below.
Figure 3.2: DE-results, top 20 differential expressed genes.
DE-Analysis does a decent job finding these differently expressed genes. However, it is confined in it’s capabilities to reflect the biologic dynamics between two or more areas of interest as it neglects spatial resolution Barcode-spots that are located at the border of two adjacent areas that are being compared might be similar in their expression levels of certain genes or completely different regardless of the DE-Analysis results which always refer to areas as a whole. The underlying dynamic of a differently expressed gene or gene-set between two areas of interest might be for example gradual, steep or abrupt. This might affect the conclusion one has to draw from the findings of DE-Analysis.
Figure 3.3: Genes found by DE-analysis that, however, behave differently.
The trajecotry has been drawn in such a way that it crosses the border between adjacent and hypoxic area roughly at it’s center. Therefore, filtering genes for those that follow the Late peak or One peak dynamic along the drawn trajectory gives answer to questions like “What happens at the border of the hypoxic area?” and “What happens closer to the core?”.
Figure 3.4: Heatmaps that each display the top 15 genes found by the trajectory screening following the respective dynamic.
Among others the gene METRN seems to peak at the border. However, it was not among the top 20 marker genes the DE-analysis returned. A closer look reveals that METRN which regulates glial cell differentiation and which promotes the formation of axonal networks during neurogenesis is actually highly expressed around the hypoxic area.
Figure 3.5: Visualizes the expression pattern of METRN corresponding to the adjacent-area.
Spatial trajectory analysis and screening might therefore be a useful tool to complement spatial DE-analysis.